Abstract

Barwise and Seligman proposed a very general qualitative theory of information flow (in distributed systems), while Shannon proposed a very general quantitative theory of communication flow. The two kinds of flow are not synonymous; information flow is the more general of the two. We synthesize a new theory from these two theories so that the qualitative and quantitative analyses use the same theory structures. The main advantages are that (1) Shannon theory gets a more expressive framework within which to operate, and (2) Barwise/Seligman theory gets to take advantage of quantitative mechanisms. The resultant theory has direct applications to steganography and covert channels, although the development of these applications will appear in a subsequent paper.


1  Introduction 

The theory presented in this paper rests upon two particular information theories. The qualitative theory by Barwise and Seligman [3] is known colloquially as channel theory. The quantitative theory by Shannon [7] is colloquially known as information theory. As several people have noticed (e.g., [4]), Shannon's information theory would be better called communication theory. We concur, and the term information theory will be used in the sense of the joint qualitative/quantitative theory we present in this paper. The main difference between the two base theories is how they view channels. The Barwise/Seligman notion of information channel can be made to support Shannon's notion of communication channel. Conversely, Shannon's quantitative methods can provide measures for the Barwise/Seligman notion of channel.

The theory presented here is more general than either Shannon's or Barwise/Seligman's for two reasons:

• Shannon restricted his theory to communication channels. By applying quantitative measures to information channels, Shannon's theory is made more inclusive and now applies to this more general notion of channel.

• Barwise/Seligman's theory ignored quantitative measures in favor of a qualitative theory. We argue that their qualitative framework can guide a quantitative theory by giving it a more expressive scaffolding upon which to apply quantitative measures.

In [5], it is pointed out that the notion of communication channel capacity fails to capture salient features of covert and steganographic channels. In image steganography, information is hidden in a cover image. The Shannon analysis of this situation can put measures on the amount of hidden information the communication channel will support. The problem is that the amounts calculated may have little to do with the transfer of actual information, because the information has a qualitative nature that is not amenable to the baseline Shannon analysis. A more sophisticated framework is required upon which to base the Shannon analysis. The information theory presented in this paper has a direct application to steganographic analysis in particular and to covert channels in general.


2  Classifications  and  Infomorphisms 

The basic unit of information in channel theory is a tuple of a binary relation. The relationship is between a token (a piece of data, say, as in Shannon theory) and a type (what kind of thing this data is). This is represented as

$$x \models P$$

where $x$ is the piece of data, $\models$ is the relation, and $P$ is the type. The symbol $\models$ is the usual semantic symbol of logic and is usually interpreted in logic as "$x$ satisfies $P$". This paper will treat $\models$ as the relation "$x$ is of type $P$". No metaphysical or epistemological baggage is to be associated with "$x$ is of type $P$", even though we sometimes use the verb "satisfy" when talking about $\models$. Also, one cannot express any property of a single token unless the property is reified as a type and the expression is via the $\models$ relation. Hence, for a number $x$ as a token, one can only express its value $V$ by an expression of the form $x \models V$. In this sense, channel theory enforces a discipline that is sometimes lacking in the analysis of information.

Relating the description in the preceding paragraph to Shannon's theory will take most of the work done in the sequel. However, to help orient a reader versed in Shannon's theory, we offer the following description here. The basic unit of information in Shannon's theory is also a tuple of a binary relation. The relation is restricted to be of the form $x \models V$, where $\models$ is a function and $V$ is the value of the token $x$. The resulting structure is typically called a state space, where $V$ is a state and the tokens are forgotten. Channel theory also has state spaces, except that the tokens are not forgotten and the types are the values. States are sometimes further collected together to form events. Channel theory allows this also, by first keeping the tokens and then replacing the states as types with events as types. For some event $E$, $x \models E$ just when $x \models s$ for some state $s \in E$. Hence, Shannon's basic ontology is neatly embedded in channel theory's ontology, with channel theory being somewhat more rigorous about the specification of the entities involved.

A collection of types and tokens with their relation is known as a classification. A more telling term might be universe of discourse, and one can freely interchange the two terms. A classification is just what one would expect: a collection of things which have the form "$x$ is a $P$", or in our parlance, "$x$ is of type $P$", i.e., $x \models P$. Information can flow between two classifications via an infomorphism, which is a special pair of contravariant maps between classifications, one for tokens and one for types. When the information flow between two classifications is so complex that it cannot be adequately expressed using a single infomorphism, the flow can be re-expressed as a channel. A channel is another classification which is connected to the original two classifications via infomorphisms.

2.1 Classifications

Definition 2.1.1 (Barwise-Seligman) A classification, $A$, is a pair of sets and a relation. The sets are called, respectively, the tokens, $Tok(A)$, and the types, $Typ(A)$. The binary relation, usually symbolized by $\models_A$, is between the two sets, i.e., $\models_A \subseteq Tok(A) \times Typ(A)$. The term $x \models_A P$ means $(x, P) \in\, \models_A$ with $x \in Tok(A)$ and $P \in Typ(A)$.
[Figure: Classification A, with the types Typ(A) above and the tokens Tok(A) below.]

It is convenient to talk about all of the tokens of a single type, or all of the types of a particular token. The following definition relativizes $Typ(-)$ and $Tok(-)$ to a particular classification.

Definition 2.1.2 Let $A = (Tok(A), Typ(A), \models_A)$ be a classification. Then for any $P \in Typ(A)$, $Tok(P) = \{y \mid y \models_A P\}$ and, for any $x \in Tok(A)$, $Typ(x) = \{Q \mid x \models_A Q\}$.
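To make Definitions 2.1.1 and 2.1.2 concrete, here is a minimal Python sketch of a finite classification. It assumes finite token and type sets and stores $\models_A$ as a set of (token, type) pairs; the class name `Classification` and its methods are illustrative choices, not part of channel theory. The usage mirrors Example 2.1.6, in which one time stamp is of neither type.

```python
from dataclasses import dataclass
from typing import FrozenSet, Hashable, Set, Tuple


@dataclass(frozen=True)
class Classification:
    """A finite classification A = (Tok(A), Typ(A), |=_A)."""
    tokens: FrozenSet[Hashable]
    types: FrozenSet[Hashable]
    rel: FrozenSet[Tuple[Hashable, Hashable]]   # the pairs (x, P) with x |=_A P

    def __post_init__(self) -> None:
        # |=_A must be a subset of Tok(A) x Typ(A)
        for x, p in self.rel:
            if x not in self.tokens or p not in self.types:
                raise ValueError(f"{(x, p)!r} is not in Tok(A) x Typ(A)")

    def sat(self, x: Hashable, p: Hashable) -> bool:
        """x |=_A P"""
        return (x, p) in self.rel

    def tok(self, p: Hashable) -> Set[Hashable]:
        """Tok(P) = {y | y |=_A P}"""
        return {y for y in self.tokens if self.sat(y, p)}

    def typ(self, x: Hashable) -> Set[Hashable]:
        """Typ(x) = {Q | x |=_A Q}"""
        return {q for q in self.types if self.sat(x, q)}


# Example 2.1.6 in miniature: nothing is recorded for time t2, so it is of neither type.
D = Classification(
    tokens=frozenset({"t0", "t1", "t2"}),
    types=frozenset({0, 1}),
    rel=frozenset({("t0", 1), ("t1", 0)}),
)
print(D.tok(1))      # {'t0'}
print(D.typ("t2"))   # set()
```

Nothing in the class forces a token to have a type; any such constraint would have to be stipulated from the outside, as the examples below stress.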

Example 2.1.3 Let $FOL = (Models, Sentences, \models_{FOL})$, where $Sentences$ are sentences in first-order logic (FOL), $Models$ are models of first-order sentences, and $x \models_{FOL} S$ iff $x$ is a model of the sentence $S$. Notice that a number of internal relations hold of the set of sentences and the set of models. However, none of these relations is imposed as an external condition in this example. The example could be enriched to include them. One could also flip this example so that the types were $Models$ and the tokens were $Sentences$, in which case sentences would be classified by models rather than models classified by sentences.

Example 2.1.4 Let $T = (Points, Opens, \models_T)$, where $Points$ are the points of a topological space, $Opens$ is the set of open sets of that space, and $x \models_T O$ iff $x \in O$. This classifies points by the open sets in which they are contained. By reversing $\models_T$, one could classify the opens by the points. The set of open sets forms a Heyting lattice, but that is not specified in this classification, and hence no use of this classification within information theory can make use of that fact. It could, however, be imposed on the classification from the outside.

Example 2.1.5 Let $M = (Messages, Contents, \models_M)$, where messages are classified by their contents. One could use an entire theory of content in conjunction with the set of types; such a theory would have much internal structure. This internal structure is not required by channel theory, but it could be imposed or stipulated if needed.

Example 2.1.6 Let $D = (Times, \{0, 1\}, \models_D)$, where $Times$ is a set of discrete time stamps, $t \models_D 1$ iff some message was sent at time $t$, and $t \models_D 0$ iff no message was sent at time $t$. Also, $t \not\models_D 1$ does not automatically imply $t \models_D 0$; there is nothing within channel theory to force this condition. The situation described by this classification might be such that there is incomplete information about whether a message has or has not been sent. You may, however, stipulate (from the outside) such a constraint within the classification. The distinction from the previous example is that, with respect to communication channels, sometimes it is not the messages themselves that are to be modeled but rather information about the messages.

2.2  Infomorphisms 

In many theories of information flow, the "flow" of information flow is rarely qualified, although it is frequently quantified as data flow. Since the currency of information is the tuple "$x$ is of type $P$", to translate information (where here we are using "translate" in its sense as a preservation mapping), one first thinks to translate the $x$ to a $y$ and the $P$ to a $Q$. This turns out not to be in accord with most uses of classifications within mathematics and logic. More to the point, the morphisms of classifications must relate the tokens and types of two classifications in a special way, not simply translate token-type tuples to token-type tuples. The reason for this is that the "flow" of information flow is a flow of logical reasoning, not a flow of the currency.

[Figure: a token $a$ and a type $P$ of classification $A$, connected by $\models_A$. The diagram only indicates that $a \in Tok(A)$ and $P \in Typ(A)$, not that $a \models_A P$.]

Definition 2.2.1 (Barwise-Seligman) Assume classifications $A = (Tok(A), Typ(A), \models_A)$ and $B = (Tok(B), Typ(B), \models_B)$. An infomorphism $h : A \to B$ is a pair of contravariant maps, $h^\wedge$ and $h^\vee$, such that $h^\wedge : Typ(A) \to Typ(B)$ and $h^\vee : Tok(B) \to Tok(A)$, and for all $p \in Tok(B)$ and $Q \in Typ(A)$, the following condition is satisfied:

$$p^h \models_A Q \quad\text{iff}\quad p \models_B Q^h$$

where, for ease of presentation, $h^\vee(p)$ is displayed as $p^h$ and $h^\wedge(Q)$ as $Q^h$.

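This can be pictured with:

[Figure: $h^\wedge$ maps $Typ(A)$ to $Typ(B)$ and $h^\vee$ maps $Tok(B)$ to $Tok(A)$, with $\models_A$ and $\models_B$ drawn as vertical lines; $p^h \models_A Q$ iff $p \models_B Q^h$.]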

The infomorphism $h$ above is (by convention) a morphism from the classification $A$ to the classification $B$. Note that this is not a commutative diagram: the $\models_A$ and $\models_B$ lines are not arrows or maps. They merely indicate binary relations.
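The infomorphism condition can be checked mechanically on finite classifications. The following is a minimal Python sketch, assuming a classification is encoded as a tuple (tokens, types, relation) and the two maps $h^\wedge$ and $h^\vee$ as dictionaries named `up` and `down`; these names are hypothetical. It tests totality of both maps and then the condition $p^h \models_A Q$ iff $p \models_B Q^h$ for every pair.

```python
from typing import Dict, Hashable, Set, Tuple

# A finite classification: (tokens, types, relation), where the relation is a set of
# (token, type) pairs.  This encoding is an assumption made for the sketch.
Classification = Tuple[Set[Hashable], Set[Hashable], Set[Tuple[Hashable, Hashable]]]


def is_infomorphism(A: Classification, B: Classification,
                    up: Dict[Hashable, Hashable],    # h^ : Typ(A) -> Typ(B)
                    down: Dict[Hashable, Hashable],  # hv : Tok(B) -> Tok(A)
                    ) -> bool:
    tok_a, typ_a, rel_a = A
    tok_b, typ_b, rel_b = B
    # Both maps must be total on their domains and land in the right codomains.
    if set(up) != typ_a or not set(up.values()) <= typ_b:
        return False
    if set(down) != tok_b or not set(down.values()) <= tok_a:
        return False
    # Fundamental property: p^h |=_A Q iff p |=_B Q^h, for all p in Tok(B), Q in Typ(A).
    return all(((down[p], q) in rel_a) == ((p, up[q]) in rel_b)
               for p in tok_b for q in typ_a)


A = ({"a1", "a2"}, {"P"}, {("a1", "P")})
B = ({"b1", "b2", "b3"}, {"P'", "R"}, {("b1", "P'"), ("b3", "P'")})
up = {"P": "P'"}
print(is_infomorphism(A, B, up, {"b1": "a1", "b2": "a2", "b3": "a1"}))  # True
print(is_infomorphism(A, B, up, {"b1": "a1", "b2": "a1", "b3": "a1"}))  # False
```

In the second call, $b_2$ is sent to $a_1$, which is of type $P$, while $b_2$ is not of type $P'$, so the condition fails.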

Example 2.2.2 Let $SET = (Models, Sentences, \models_{SET})$, where $Sentences$ are sentences of set theory in the language of FOL and $Models$ are models of set theory. Let $NUM = (Models, Sentences, \models_{NUM})$, where $Sentences$ are sentences of number theory in the language of FOL and $Models$ are models of number theory. An infomorphism $h : NUM \to SET$ might describe number theory as a part of set theory, i.e., translate every sentence of number theory into an equivalent sentence of set theory. The model map goes in the opposite direction: every model of set theory provides a model of number theory. Let $m$ be a model of set theory and $P$ some statement of number theory; then

$$m^h \models_{NUM} P \quad\text{iff}\quad m \models_{SET} P^h$$

says that $m^h$ is a model of a sentence $P$ in number theory iff $m$ itself is a model of the interpretation of $P$, namely $P^h$, in set theory.

Example 2.2.3 Let $T$ and $T'$ both be topological classifications. Then a map $f : Tok(T) \to Tok(T')$ is continuous just when $f^{-1}$ is a map from $Typ(T')$ to $Typ(T)$. The pair $(f^{-1}, f)$ constitutes an infomorphism from $T'$ to $T$. For any point $x$ of $T$ and open set $O$ of $T'$:

$$f(x) \models_{T'} O \quad\text{iff}\quad x \models_T f^{-1}(O)$$

simply because

$$f(x) \in O \quad\text{iff}\quad x \in f^{-1}(O).$$
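Example 2.2.3 can be tried on finite topologies. The sketch below, in Python and assuming open sets are given explicitly as frozensets and the point map $f$ as a dictionary, builds the type map $O \mapsto f^{-1}(O)$ and returns `None` when some preimage is not open, i.e., when $f$ is not continuous and the pair fails to be an infomorphism from $T'$ to $T$. The function name `continuity_infomorphism` and the sample spaces are hypothetical.

```python
from typing import Dict, FrozenSet, Hashable, Optional, Set, Tuple

Open = FrozenSet[Hashable]


def preimage(f: Dict[Hashable, Hashable], O: Open) -> Open:
    """f^{-1}(O) for a finite map f given as a dict from points of T to points of T'."""
    return frozenset(x for x in f if f[x] in O)


def continuity_infomorphism(opens_T: Set[Open], opens_T2: Set[Open],
                            f: Dict[Hashable, Hashable]
                            ) -> Optional[Tuple[Dict[Open, Open], Dict[Hashable, Hashable]]]:
    """Return (f^{-1} on opens, f on points), an infomorphism T' -> T, or None if f is
    not continuous, i.e. some preimage of an open set of T' is not open in T."""
    up = {O: preimage(f, O) for O in opens_T2}     # type map Typ(T') -> Typ(T)
    if not all(U in opens_T for U in up.values()):
        return None
    # The infomorphism condition f(x) |=_T' O iff x |=_T f^{-1}(O) holds by construction.
    assert all((f[x] in O) == (x in up[O]) for x in f for O in opens_T2)
    return up, f


fs = frozenset
opens_T = {fs(), fs({1}), fs({1, 2}), fs({1, 2, 3})}      # a topology on {1, 2, 3}
opens_T2 = {fs(), fs({"a"}), fs({"a", "b"})}               # a topology on {a, b}
f = {1: "a", 2: "b", 3: "b"}                               # continuous
g = {1: "b", 2: "a", 3: "a"}                               # g^{-1}({a}) = {2, 3} is not open
print(continuity_infomorphism(opens_T, opens_T2, f) is not None)   # True
print(continuity_infomorphism(opens_T, opens_T2, g))               # None
```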


Example 2.2.4 Assume there are message classifications $M = (Messages, Contents, \models_M)$ and $M' = (Messages', Contents', \models_{M'})$. An infomorphism $h : M \to M'$ might model a function changing messages from $M'$ into messages in $M$ such that what can be said about the translation can be mapped into something that can be said about the original message:

$$m^h \models_M C \quad\text{iff}\quad m \models_{M'} C^h$$

Here, the translation works in the direction opposite to that of number theory into set theory.

Example 2.2.5 Let $D = (Times, \{0, 1\}, \models_D)$ and $D' = (Times, \{0, 1\}, \models_{D'})$ be two discrete time classifications of messages. An infomorphism $h : D \to D'$, defined by $t^h = t'$, where $t'$ means the next time step after $t$, and $X^h = X$ for $X \in \{0, 1\}$, models sending messages at one time interval and their reception at the next:

$$t^h \models_D 1 \;\text{iff}\; t \models_{D'} 1 \qquad\qquad t^h \models_D 0 \;\text{iff}\; t \models_{D'} 0$$

These say that a message associated with time $t^h$ is received iff a message associated with time $t$ is sent, and that no message is associated with $t^h$ iff no message was sent at time $t$. Notice that there is no mention that a message sent must be the same as a message received. Again, this is something external to be stipulated. One could easily change the tokens to include the actual messages in order to accommodate this restriction. In this example, the communication channel is modeled as an infomorphism. For more complicated communication channels, this will not be sufficient, and the communication channel will be modeled as another classification.
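The one-step channel of Example 2.2.5 can be checked against concrete logs. This Python sketch assumes each classification is given as a dictionary from time stamps to 0 or 1 (a missing time is of neither type), that $t^h = t + 1$, and that $X^h = X$; the function name and the sample logs are hypothetical.

```python
def delay_channel_is_infomorphism(sent, received, delay=1):
    """Check t^h |=_D X iff t |=_D' X^h for every sender time t and X in {0, 1}.

    sent:     dict time -> 0/1, the relation |=_D' (sender side, tokens of D')
    received: dict time -> 0/1, the relation |=_D  (receiver side, tokens of D)
    Assumed encoding: t^h = t + delay and X^h = X; a time missing from a dict
    is of neither type, as Example 2.1.6 allows.
    """
    rel_sender = set(sent.items())
    rel_receiver = set(received.items())
    return all(((t + delay, X) in rel_receiver) == ((t, X) in rel_sender)
               for t in sent for X in (0, 1))


sent = {0: 1, 1: 0, 2: 1, 3: 1}
print(delay_channel_is_infomorphism(sent, {1: 1, 2: 0, 3: 1, 4: 1}))  # True
print(delay_channel_is_infomorphism(sent, {1: 1, 2: 0, 3: 0, 4: 1}))  # False
```

In the second call, the message sent at $t = 2$ is not received at $t = 3$, so the infomorphism condition fails. As the example notes, nothing here checks that the message received is the same message that was sent.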


3  Classifications  and  Probability 

3.1  State  Spaces  and  Event  Classifications 

Definition 3.1.1 (Barwise-Seligman) A state space, $S = (Tok(S), Typ(S), state_S)$, is a classification where $state_S : Tok(S) \to Typ(S)$ is a function, i.e., each token is of a unique type.

The tokens are typically abstractions of the system. One might view the tokens as snapshots of the system at various times. The types are typically vectors of values of the system variables. One could reify the tokens as vectors of system variable values at specified times. In this case, the state function merely strips off the time value, yielding the vector representing the system state. The reason $state_S$ is a function is that a system can be in only one state at a time.

Definition 3.1.2 (Barwise-Seligman) A state space morphism, $f : S_1 \to S_2$, is a pair of maps

$$f : Tok(S_1) \to Tok(S_2) \quad\text{and}\quad f : Typ(S_1) \to Typ(S_2)$$

such that

$$state_{S_2}(x^f) = (state_{S_1}(x))^f.$$

Note that both maps run in the same (covariant) direction, in contradistinction to infomorphisms, whose maps run in opposing (contravariant) directions. Typically, state space analyses totally ignore the notion of token, and only the states are deemed important. However, this is not sensitive enough for a qualitative theory, where a state may arise for two entirely different reasons. Also, a token is typically not the system itself. Were that the case, there would be only one value or type. A token must be more of an abstract notion of the system at a particular time or place.
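The covariant condition of Definition 3.1.2 is easy to test on finite data. In this Python sketch a state space is assumed to be a dictionary from tokens to states (so $state_S$ is automatically a function), and the two maps are passed as callables; the banding function `band` and the temperature readings are hypothetical illustrations of coarsening one state space into another.

```python
from typing import Callable, Dict, Hashable

State = Dict[Hashable, Hashable]   # a finite state space as token -> state (a function)


def is_state_space_morphism(state1: State, state2: State,
                            f_tok: Callable[[Hashable], Hashable],
                            f_typ: Callable[[Hashable], Hashable]) -> bool:
    """Covariant check: state_{S2}(f(x)) == f(state_{S1}(x)) for every x in Tok(S1).
    Returns False if f_tok sends a token outside Tok(S2)."""
    return all(f_tok(x) in state2 and state2[f_tok(x)] == f_typ(state1[x])
               for x in state1)


def band(celsius):
    """Hypothetical coarsening of temperature readings into bands."""
    return "cold" if celsius < 20 else "ok" if celsius <= 25 else "hot"


s1 = {"t0": 18, "t1": 23, "t2": 30}             # fine-grained readings
s2 = {"t0": "cold", "t1": "ok", "t2": "hot"}    # banded readings of the same snapshots
print(is_state_space_morphism(s1, s2, lambda x: x, band))   # True
```

Both maps run from $S_1$ to $S_2$, in contrast with the token map of an infomorphism.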
Definition 3.1.3 (Barwise-Seligman) The event classification, $Evt(S)$, associated with a state space $S$ has tokens $Tok(S)$ and types $Typ(Evt(S)) = \mathcal{P}(Typ(S))$, where $\mathcal{P}(Typ(S))$ is the power set of $Typ(S)$, and $x \models_{Evt(S)} E$ iff $state_S(x) \in E$.

This definition could be weakened by requiring only $Typ(Evt(S)) \subseteq \mathcal{P}(Typ(S))$, but it is handy for $Evt$ to identify a particular event space.

Definition 3.1.4 (Barwise-Seligman) Given the state space morphism $f : S_1 \to S_2$, the event space morphism $Evt(f) : Evt(S_2) \to Evt(S_1)$ is the pair of contravariant maps where $Evt(f)^\vee = f$ on tokens and $Evt(f)^\wedge = f^{-1}$ on types.
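Definitions 3.1.3 and 3.1.4 can be exercised together. The following Python sketch, assuming finite state spaces encoded as token-to-state dictionaries, builds the event relation $x \models_{Evt(S)} E$ iff $state_S(x) \in E$ and checks the infomorphism condition for $Evt(f)$ with token map $f$ and type map $E \mapsto f^{-1}(E)$ over every event. In this finite setting the check succeeds exactly when $f$ satisfies the state space morphism condition. The helper names and the temperature data are hypothetical.

```python
from itertools import chain, combinations


def powerset(s):
    """All subsets of a finite set, as frozensets: the types of Evt(S)."""
    s = list(s)
    return [frozenset(c)
            for c in chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))]


def evt_sat(state, x, E):
    """x |=_{Evt(S)} E iff state_S(x) is in E."""
    return state[x] in E


def evt_is_infomorphism(state1, typ1, state2, typ2, f_tok, f_typ):
    """Check Evt(f): Evt(S2) -> Evt(S1), with token map f and type map E -> f^{-1}(E)."""
    for E in powerset(typ2):
        pre = frozenset(s for s in typ1 if f_typ(s) in E)       # f^{-1}(E)
        for x in state1:
            # f(x) |=_{Evt(S2)} E  iff  x |=_{Evt(S1)} f^{-1}(E)
            if evt_sat(state2, f_tok(x), E) != evt_sat(state1, x, pre):
                return False
    return True


def band(c):
    """Hypothetical coarsening of temperature readings into bands."""
    return "cold" if c < 20 else "ok" if c <= 25 else "hot"


s1 = {"t0": 18, "t1": 23, "t2": 30}
s2 = {"t0": "cold", "t1": "ok", "t2": "hot"}
print(evt_is_infomorphism(s1, {18, 23, 30}, s2, {"cold", "ok", "hot"},
                          lambda x: x, band))   # True
```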

3.2  Probability  Spaces 

The notion of probability adheres to the types of a classification. Let $A = (Tok(A), Typ(A), \models_A)$ be a classification. For $E$ a type, $\mathbb{P}(E)$ is the probability assigned to the type $E$, and is thought of as the probability that any token is of type $E$. The probability is taken with respect to the entire set of tokens. Notice that this is distinctly different from the notion of a particular $x$ being of type $E$ with a probability or confidence level of $p$. This latter might be symbolized with $x \models^p E$ and represents the idea that $\models$ (in this instance) is not a concrete relation, or not a relation we have concrete information about. This has a distinctly Bayesian tinge to it and, although intriguing, we will not consider it here.

Shannon theory always works at the level of types, because it works with average amounts of information, not specific pieces of information. Typically, probability theory would force the assumption that the collection of types be a Borel algebra. A probability function $\mathbb{P}$ is then a monotone map from this Borel algebra to the real interval $[0, 1]$, i.e., for $x$, $y$ members of a Borel algebra upon which $\mathbb{P}$ is defined, $x \le y$ implies $\mathbb{P}(x) \le \mathbb{P}(y)$. Channel theory imposes no structure upon any set of types, although there is an induced preorder. It is this preorder we take advantage of when defining probability functions for classifications.

Definition 3.2.1 Given a classification $A$, the token induced preorder on $Typ(A)$ is defined with

$$P \preceq Q \quad\text{iff}\quad Tok(P) \subseteq Tok(Q)$$

and the token induced partial order on $Typ(A)$ is defined with

$$P \trianglelefteq Q \quad\text{iff}\quad P \prec Q \text{ or } Tok(P) = Tok(Q),$$

where $P \prec Q$ means $Tok(P) \subsetneq Tok(Q)$ and types with equal token sets are identified.

The partial order $\trianglelefteq$ is essentially $\preceq$ divided out by the symmetries induced by equalities of the form $Tok(P) = Tok(Q)$, which yield both $P \preceq Q$ and $Q \preceq P$. The reason to define $\preceq$ as a preorder instead of promoting it to a partial order is that the collection of types might have a very intensional description, and this would be lost if $\preceq$ collapsed types based on the extensional nature of sets alone.
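The difference between the preorder and its partial-order quotient shows up as soon as two intensionally different types share their tokens. This Python sketch assumes types are given by a dictionary from each type to its token set; `preceq` implements $\preceq$, and `classes` groups types with equal token sets, which is one way to realize the identification on which $\trianglelefteq$ compares classes by token-set inclusion. The names and data are hypothetical.

```python
from typing import Dict, Hashable, List, Set


def preceq(tok: Dict[Hashable, Set[Hashable]], P: Hashable, Q: Hashable) -> bool:
    """Token induced preorder: P ≼ Q iff Tok(P) ⊆ Tok(Q)."""
    return tok[P] <= tok[Q]


def classes(tok: Dict[Hashable, Set[Hashable]]) -> List[Set[Hashable]]:
    """Group types whose token sets are equal; the partial order compares these groups."""
    groups: Dict[frozenset, Set[Hashable]] = {}
    for P, ts in tok.items():
        groups.setdefault(frozenset(ts), set()).add(P)
    return list(groups.values())


tok = {
    "bird":      {"tweety", "polly"},
    "feathered": {"tweety", "polly"},   # intensionally different, same extension
    "animal":    {"tweety", "polly", "rex"},
    "dog":       {"rex"},
}
print(preceq(tok, "bird", "animal"))                                          # True
print(preceq(tok, "bird", "feathered") and preceq(tok, "feathered", "bird"))  # True
print(len(classes(tok)))                                                      # 3
```

Under $\preceq$ alone, "bird" and "feathered" remain distinct types that precede each other, which is exactly the intensional distinction the preorder is meant to keep.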

Definition 3.2.2 Given a classification $A$, a set $\Gamma \subseteq Typ(A)$ is called disjoint just when for any two distinct types $P, Q \in \Gamma$, $Tok(P) \cap Tok(Q) = \emptyset$.

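Definition 3.2.3 An abstract probability space is a classification $A$ together with a probability function $\mathbb{P} : Typ(A) \to \mathbb{R}$ satisfying (for types $P$ and $Q$):

($\mathbb{P}$1) for any countable set $\Gamma$ of disjoint types with members $P_i$,
$$\sum_{i=1}^{|\Gamma|} \mathbb{P}(P_i) \le 1;$$

($\mathbb{P}$2) if $Tok(P) = Tok(A)$ then $\mathbb{P}(P) = 1$;

($\mathbb{P}$3) for any countable set $\Gamma$ of disjoint types with members $P_i$,
$$\big[\forall i\, (1 \le i \le |\Gamma| \text{ implies } P_i \preceq Q)\big] \text{ implies } \sum_{i=1}^{|\Gamma|} \mathbb{P}(P_i) \le \mathbb{P}(Q);$$

($\mathbb{P}$4) for any countable set $\Gamma$ of disjoint types with members $P_i$,
$$Tok(Q) \subseteq \bigcup\{Tok(P_i) \mid P_i \in \Gamma\} \text{ implies } \mathbb{P}(Q) \le \sum_{i=1}^{|\Gamma|} \mathbb{P}(P_i);$$

($\mathbb{P}$5) $Tok(P) = \emptyset$ implies $\mathbb{P}(P) = 0$.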

The axiom ($\mathbb{P}$2) is different from Kolmogorov's axioms for the simple reason that there need not be a type which all tokens satisfy. One can always adjoin a type, $U$, to the types (of a classification) such that all tokens satisfy $U$, and it will not affect the exposition here. The third axiom implies that $\mathbb{P}$ is monotone (take $\Gamma = \{P\}$), and hence $P \preceq Q$ and $Q \preceq P$ imply $\mathbb{P}(P) = \mathbb{P}(Q)$. The third and the fourth axioms will force the Kolmogorov axiom

$$\mathbb{P}\Big(\bigvee\{P_i \mid P_i \in \Gamma\}\Big) = \sum_{i=1}^{|\Gamma|} \mathbb{P}(P_i)$$

to be true if the collection of types is a Borel algebra. The last axiom is an abstraction of the situation where event spaces are generated from state spaces, and of the observation that if a state cannot occur, i.e., it has no tokens, then it must have probability 0. Events are collections of states, so if the states of an event have no tokens, the event has probability 0.
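For small finite classifications the five axioms can be checked by brute force. The Python sketch below assumes a uniform weight on tokens as the probability function (written `Pr` in the code), i.e., $\mathbb{P}(P) = |Tok(P)| / |Tok(A)|$, which is only one possible choice, and it reads the first axiom as the bound "probabilities of disjoint types sum to at most 1". All names and data are hypothetical; `bad` violates that bound.

```python
from fractions import Fraction
from itertools import combinations


def counting_probability(tokens, tok):
    """Hypothetical choice of Pr: weight every token equally, Pr(P) = |Tok(P)| / |Tok(A)|."""
    return {P: Fraction(len(ts), len(tokens)) for P, ts in tok.items()}


def satisfies_axioms(tokens, tok, prob):
    """Brute-force check of the five axioms on a small finite classification."""
    types = list(tok)
    disjoint = [G for r in range(len(types) + 1) for G in combinations(types, r)
                if all(not (tok[P] & tok[Q]) for P, Q in combinations(G, 2))]
    ok = all(sum(prob[P] for P in G) <= 1 for G in disjoint)                   # axiom 1
    ok = ok and all(prob[P] == 1 for P in types if tok[P] == set(tokens))      # axiom 2
    for G in disjoint:
        total = sum(prob[P] for P in G)
        for Q in types:
            if all(tok[P] <= tok[Q] for P in G):                                # axiom 3
                ok = ok and total <= prob[Q]
            if tok[Q] <= set().union(*(tok[P] for P in G)):                     # axiom 4
                ok = ok and prob[Q] <= total
    return ok and all(prob[P] == 0 for P in types if not tok[P])               # axiom 5


tokens = {"tweety", "polly", "rex"}
tok = {"bird": {"tweety", "polly"}, "dog": {"rex"},
       "animal": {"tweety", "polly", "rex"}, "unicorn": set()}
prob = counting_probability(tokens, tok)
print(satisfies_axioms(tokens, tok, prob))                                    # True
bad = dict(prob, bird=Fraction(1, 2), dog=Fraction(2, 3))  # disjoint, sum 7/6 > 1
print(satisfies_axioms(tokens, tok, bad))                                     # False
```

Note that "unicorn" has no tokens, so the fifth axiom forces its probability to be 0, mirroring a state that never occurs.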

Theorem 3.2.4 If $A$ is an abstract probability space and $Typ(A)$ under the token induced partial order $\trianglelefteq$ has a Borel lattice structure, then $A$ is a probability space.

proof: The Kolmogorov axioms for a probability space are easily seen to be true under these conditions. ■

The $\preceq$ order is a remnant of the Boolean lattice order of a Borel algebra. In fact, if the classification does arise as an event space from a state space, the $\preceq$ order is very nearly the $\subseteq$ order on the event space. The only difference is that the event space definition does not require a state to have tokens, yet the $\preceq$ order is defined entirely in terms of tokens. If every state of every event has tokens, then the two orders are isomorphic. In any case, with the exclusion of the last axiom, every probability space defined on an event space is an abstract probability space, since the first three axioms are easily satisfied and, since $\cup$ is the least upper bound, the fourth axiom is satisfied.

Theorem 3.2.5 Let $\mathcal{E} = (Typ(Evt(S)), \bigcap, \bigcup, -, \top, \bot)$ be the complete Boolean lattice of sets, where $Evt(S)$ is the event space defined from a state space $S$, $\top = Typ(S)$, and $\bot = \emptyset$. Let

$$\mathcal{T}(\mathcal{E}) = \{u \mid u = Tok(P) \text{ and } P \in Typ(Evt(S))\}.$$

Then $\mathcal{T} = (\mathcal{T}(\mathcal{E}), \bigcap, \bigcup, -, Tok(S), \emptyset)$ is a complete Boolean lattice of token sets. The function $f : \mathcal{E} \to \mathcal{T}$ where

$$f(P) = Tok(P) = \{x \mid x \models_S s \text{ and } s \in P\}$$

is a lattice epimorphism sending $\top$ to $Tok(S)$ and $\bot$ to $\emptyset$. If $Tok(s) \neq \emptyset$ for all $s \in Typ(S)$, then $Tok(-)$ is also 1-1.
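proof: Let $\Gamma \subseteq Typ(Evt(S))$. Then

$$\begin{aligned}
x \in Tok\big(\textstyle\bigcup\{P \mid P \in \Gamma\}\big) &\iff (\exists s \in \textstyle\bigcup\{P \mid P \in \Gamma\})[x \models_S s] \\
&\iff (\exists s)(\exists P \in \Gamma)[x \models_S s \text{ and } s \in P] \\
&\iff (\exists P \in \Gamma)[x \in Tok(P)] \\
&\iff x \in \textstyle\bigcup\{Tok(P) \mid P \in \Gamma\}
\end{aligned}$$

and

$$\begin{aligned}
x \in Tok\big(\textstyle\bigcap\{P \mid P \in \Gamma\}\big) &\iff (\exists s \in \textstyle\bigcap\{P \mid P \in \Gamma\})[x \models_S s] \\
&\iff (\exists s)(\forall P \in \Gamma)[x \models_S s \text{ and } s \in P] \\
&\iff (\forall P \in \Gamma)[x \in Tok(P)] \\
&\iff x \in \textstyle\bigcap\{Tok(P) \mid P \in \Gamma\}
\end{aligned}$$

and, for $P \in Typ(Evt(S))$ and $x \in Tok(S)$,

$$\begin{aligned}
x \in Tok(\top - P) &\iff (\exists s \in \top - P)[x \in Tok(s)] \\
&\iff x \notin Tok(P) \\
&\iff x \in Tok(S) - Tok(P)
\end{aligned}$$

and

$$\begin{aligned}
x \in Tok(\top) &\iff (\exists s \in \top)[x \in Tok(s)] \iff x \in Tok(S) \\
x \in Tok(\bot) &\iff (\exists s \in \bot)[x \in Tok(s)] \iff x \in \emptyset.
\end{aligned}$$

The third step of the intersection case and the second step of the complement case use the fact that each token is of exactly one state, since $\models_S$ is a function. This shows that the set operations on $\mathcal{T}(\mathcal{E})$ are well-defined and that $f$ is an epimorphism. Now assume $Tok(s) \neq \emptyset$ for all $s \in Typ(S)$, and let $P, Q \in Typ(Evt(S))$ with $P \neq Q$. Without loss of generality, let $s \in P$ and $s \notin Q$. Since $Tok(s) \neq \emptyset$, there is some $x$ such that $x \models_S s$, and so $x \in Tok(P)$. Since $\models_S$ is a function (as opposed to a mere relation), $x \notin Tok(Q)$, and therefore $f$ is 1-1. ■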
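Theorem 3.2.5 can be sanity-checked on a small state space. This Python sketch assumes the state space is a token-to-state dictionary and checks binary joins and meets, complement, top and bottom (a finite stand-in for the arbitrary $\Gamma$ in the proof), then reports whether $Tok(-)$ is one-to-one. The function names and data are hypothetical; the second call adds a state with no tokens, which breaks injectivity exactly as the theorem's hypothesis predicts.

```python
from itertools import chain, combinations


def tok_of(state, E):
    """Tok(E) = {x | state_S(x) in E} for an event E ⊆ Typ(S)."""
    return frozenset(x for x, s in state.items() if s in E)


def check_theorem_3_2_5(state, states):
    """Check that Tok(-) preserves binary joins, meets, complement, top and bottom
    on every event, and report whether Tok(-) is one-to-one."""
    events = [frozenset(c) for c in chain.from_iterable(
        combinations(states, r) for r in range(len(states) + 1))]
    everything = frozenset(state)                          # Tok(S)
    assert tok_of(state, states) == everything             # top -> Tok(S)
    assert tok_of(state, frozenset()) == frozenset()       # bottom -> empty set
    for E in events:
        assert tok_of(state, states - E) == everything - tok_of(state, E)
        for F in events:
            assert tok_of(state, E | F) == tok_of(state, E) | tok_of(state, F)
            assert tok_of(state, E & F) == tok_of(state, E) & tok_of(state, F)
    return len({tok_of(state, E) for E in events}) == len(events)


state = {"x1": "on", "x2": "off", "x3": "on"}
print(check_theorem_3_2_5(state, frozenset({"on", "off"})))             # True
print(check_theorem_3_2_5(state, frozenset({"on", "off", "broken"})))   # False
```

The meet check relies on each token having exactly one state; a dictionary guarantees this, matching the role of $\models_S$ being a function in the proof.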


Example 3.2.6 Suppose there is a physical system which is described by a state space $S$. Let $Tok(S)$ be instances of the system at various times. Time need have no beginning or end for the system, although you can impose one. The states of the system are vectors of measurable properties. The event space $Evt(S)$ has as types the power set of $Typ(S)$ and as tokens $Tok(S)$, such that $x \models_{Evt(S)} E$ iff there is some $s \in E$ with $state_S(x) = s$. The proportion of time the system spends in a particular state, $s$, is modeled as $\mathbb{P}(\{s\})$. The proportion of time associated with an event $E$ is $\mathbb{P}(E)$, which totals up all the time the system spends in any of the states of $E$.
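Example 3.2.6 corresponds to a simple frequency computation. The Python sketch below assumes a finite observation window of equal-length time steps, where the $i$-th entry of `trace` is the state of the token "the system at step $i$"; both the trace and the function names are hypothetical. The probability of an event is the fraction of steps whose state lies in it, which equals the sum of the state proportions over the event.

```python
from collections import Counter
from fractions import Fraction


def state_proportions(trace):
    """Pr({s}): fraction of time steps (tokens) at which the system is in state s."""
    counts = Counter(trace)
    return {s: Fraction(c, len(trace)) for s, c in counts.items()}


def event_probability(trace, E):
    """Pr(E): fraction of time steps whose state lies in the event E."""
    return Fraction(sum(1 for s in trace if s in E), len(trace))


trace = ["idle", "idle", "busy", "idle", "fault", "busy", "idle", "idle"]
p = state_proportions(trace)
print(p["idle"])                                      # 5/8
print(event_probability(trace, {"busy", "fault"}))    # 3/8
```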

Incidentally, the $\preceq$ order turns out to be preserved by infomorphisms, i.e., infomorphisms are monotone maps on types. That this order is preserved by infomorphisms without any extra conditions shows that it is an intrinsic feature of this category of classifications.

Theorem 3.2.7 Let $h : A \to B$ be an infomorphism. Then $P \preceq Q$ implies $P^h \preceq Q^h$.

proof: Assume $P \preceq Q$ and let $x \in Tok(P^h)$; then $x \models_B P^h$. From the infomorphism condition, $x^h \models_A P$, and hence $x^h \in Tok(P)$. Since $P \preceq Q$, $x^h \in Tok(Q)$ and $x^h \models_A Q$. From the infomorphism condition again, $x \models_B Q^h$, and hence $x \in Tok(Q^h)$. By definition, $P^h \preceq Q^h$. ■
Dually, one can define a preorder on tokens by extending $Typ(-)$ within a classification $A$ using $Typ(x) = \{P \mid x \models_A P\}$, with $x \preceq y$ iff $Typ(x) \subseteq Typ(y)$. A similar theorem will show that $x \preceq y$ implies $x^h \preceq y^h$ for $h : A \to B$.

Theorem 3.2.8 Let $h : A \to B$ be an infomorphism. Then $h^\vee(Tok(P^h)) \subseteq Tok(P)$ and $h^\wedge(Typ(x^h)) \subseteq Typ(x)$.

The proof is a simple application of the infomorphism condition. In fact, these two inclusions, taken as axioms, completely characterize infomorphisms.
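Theorems 3.2.7 and 3.2.8 can be checked on a concrete infomorphism. In this Python sketch, relations are assumed to be sets of (token, type) pairs and the maps $h^\wedge$, $h^\vee$ are dictionaries `up` and `down`; the data are hypothetical. The function returns three booleans: monotonicity (Theorem 3.2.7) and the two inclusions of Theorem 3.2.8. The second call uses a token map that is not part of an infomorphism, and the type inclusion catches it.

```python
def tok(rel, P):
    """Tok(P) within the classification whose relation is rel."""
    return {x for (x, Q) in rel if Q == P}


def typ(rel, x):
    """Typ(x) within the classification whose relation is rel."""
    return {Q for (y, Q) in rel if y == x}


def check_theorems(rel_A, rel_B, up, down):
    """Return (Theorem 3.2.7 holds, token inclusion of 3.2.8, type inclusion of 3.2.8)."""
    types_A = set(up)
    monotone = all(tok(rel_B, up[P]) <= tok(rel_B, up[Q])
                   for P in types_A for Q in types_A
                   if tok(rel_A, P) <= tok(rel_A, Q))
    tokens_ok = all({down[x] for x in tok(rel_B, up[P])} <= tok(rel_A, P)
                    for P in types_A)
    types_ok = all({up[Q] for Q in typ(rel_A, down[x])} <= typ(rel_B, x)
                   for x in down)
    return monotone, tokens_ok, types_ok


rel_A = {("a1", "bird"), ("a1", "animal"), ("a2", "animal")}
rel_B = {("b1", "Bird"), ("b1", "Animal"), ("b2", "Animal"),
         ("b3", "Bird"), ("b3", "Animal")}
up = {"bird": "Bird", "animal": "Animal"}          # h^ : Typ(A) -> Typ(B)
down = {"b1": "a1", "b2": "a2", "b3": "a1"}        # hv : Tok(B) -> Tok(A)
print(check_theorems(rel_A, rel_B, up, down))      # (True, True, True)
# Not an infomorphism: b2 is sent to a bird.
print(check_theorems(rel_A, rel_B, up, {"b1": "a1", "b2": "a1", "b3": "a1"}))  # (True, True, False)
```

The second output shows why both inclusions are needed: the token inclusion alone gives only one direction of the infomorphism condition.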


4  Sequents  and  Logics 

A sequent represents a constraint that may or may not hold of a classification. It is a logical statement in that it represents a relation between premises and conclusions. The premises and conclusions are sets of types. It is sequents that enable the flow of information, and the information flow they enable is a flow of reasoning. That said, sequents may also be used to model communication flows when the sequents are modeling communication. A communication sequent, or gate, can be thought of as allowing a token to flow under it just when the token satisfying the premises entails that the token satisfies the conclusion.

4.1  Sequents 

Definition 4.1.1 (Barwise-Seligman) Let $A$ be a classification. A theory for $A$ is a collection of sequents of the form:

$$\Gamma \vdash_A \Delta$$

where $\Gamma$ and $\Delta$ are collections of types and $\vdash_A$ is the turnstile of logical consequence.

This is the usual notion of sequent. The types in $\Gamma$ are thought of as conjoined together and the types in $\Delta$ as disjoined. The requirement for a token $x$ to satisfy the above sequent is:

$$(\text{for all } P \in \Gamma,\ x \models_A P) \text{ implies } (\text{there exists at least one } Q \in \Delta,\ x \models_A Q).$$

When $\Gamma$ or $\Delta$ is a singleton set, say $\{P\}$, then $P \vdash_A \Delta$ or $\Gamma \vdash_A P$ will be used. It is important to notice that channel theory imposes no logical structure on the types as a restriction. They are merely types. Any logical structure could be imposed as a result of attempting to model some domain of discourse, but channel theory simpliciter does not impose one itself. Imposing structure on the collection of types means that the structure is not accessible via infomorphisms (i.e., at the level of category theory) and is only accessible at the additional cost of extra mathematical scaffolding. Any extra structure would come about because some peculiar feature of a universe of discourse needed to be modeled.
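The satisfaction condition for $\Gamma \vdash_A \Delta$ translates directly into code. In this Python sketch, assuming the relation is a set of (token, type) pairs, $\Gamma$ is read conjunctively and $\Delta$ disjunctively, and a sequent holds of the classification when every token satisfies it. The data are hypothetical; note that no logical structure is assumed on the types themselves.

```python
def satisfies(rel, x, gamma, delta):
    """x satisfies Γ ⊢ Δ: if x is of every type in Γ, then x is of some type in Δ."""
    return (not all((x, P) in rel for P in gamma)) or any((x, Q) in rel for Q in delta)


def holds(rel, tokens, gamma, delta):
    """The sequent Γ ⊢ Δ holds of the classification if every token satisfies it."""
    return all(satisfies(rel, x, gamma, delta) for x in tokens)


tokens = {"tweety", "pingu", "rex"}
rel = {("tweety", "bird"), ("tweety", "flies"),
       ("pingu", "bird"), ("pingu", "penguin"), ("rex", "dog")}
print(holds(rel, tokens, {"bird"}, {"flies"}))               # False: pingu
print(holds(rel, tokens, {"bird"}, {"flies", "penguin"}))    # True: Δ read disjunctively
print(holds(rel, tokens, set(), {"bird", "dog"}))            # True: every token is one or the other
print(holds(rel, tokens, {"bird", "dog"}, set()))            # True: no token is both
```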

4.2  Logics 

Definition 4.2.1 (Barwise-Seligman) A local logic $\mathcal{L} = (A, \vdash_{\mathcal{L}}, N_{\mathcal{L}})$ consists of a classification $A$, a set $\vdash_{\mathcal{L}}$ of sequents involving the types of $A$, and a subset $N_{\mathcal{L}} \subseteq Tok(A)$, called the normal tokens of $\mathcal{L}$, which satisfy all the constraints $\vdash_{\mathcal{L}}$. A local logic $\mathcal{L}$ is sound if every token is normal; it is complete if every sequent that holds of all normal tokens is in the consequence relation $\vdash_{\mathcal{L}}$.
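Soundness of a local logic can be illustrated with a default rule that has an exception. This Python sketch assumes relations are sets of (token, type) pairs and takes as $N_{\mathcal{L}}$ the largest set of tokens satisfying every constraint; the definition only requires $N_{\mathcal{L}}$ to satisfy the constraints, so this is one admissible choice, and the names and data are hypothetical.

```python
def satisfies(rel, x, gamma, delta):
    """x satisfies Γ ⊢ Δ (Γ conjunctive, Δ disjunctive)."""
    return (not all((x, P) in rel for P in gamma)) or any((x, Q) in rel for Q in delta)


def largest_normal_set(rel, tokens, sequents):
    """The tokens satisfying every constraint: the largest admissible set of normal tokens."""
    return {x for x in tokens
            if all(satisfies(rel, x, gamma, delta) for gamma, delta in sequents)}


def is_sound(tokens, normal):
    """A local logic is sound if every token is normal."""
    return set(normal) == set(tokens)


tokens = {"tweety", "pingu", "rex"}
rel = {("tweety", "bird"), ("tweety", "flies"), ("pingu", "bird"), ("rex", "dog")}
constraints = [({"bird"}, {"flies"})]        # "birds fly"
normal = largest_normal_set(rel, tokens, constraints)
print(sorted(normal))                         # ['rex', 'tweety']
print(is_sound(tokens, normal))               # False: pingu is an exception
```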
Typically, the sequents are required to follow certain structural rules, but these will not concern us in this paper. The following two (non-structural) rules allow for the movement of logics between classifications connected via the infomorphism $f : A \to B$:

$$\frac{\Gamma^{-f} \vdash_A \Delta^{-f}}{\Gamma \vdash_B \Delta}\;f\text{-Intro} \qquad \frac{\Gamma \vdash_A \Delta}{\Gamma^{f} \vdash_B \Delta^{f}}\;f\text{-Intro} \qquad \frac{\Gamma^{f} \vdash_B \Delta^{f}}{\Gamma \vdash_A \Delta}\;f\text{-Elim} \qquad \frac{\Gamma \vdash_B \Delta}{\Gamma^{-f} \vdash_A \Delta^{-f}}\;f\text{-Elim}$$

where $\Gamma^{-f}$ is a nicer way of writing $f^{-1}[\Gamma]$, i.e., the inverse image of $\Gamma$ under $f$, and $\Delta^{f}$ is the direct image of $\Delta$ under $f$. Each rule has two forms. $f$-Intro preserves validity, to wit: assume the premise and let $x$ be a counterexample to the conclusion. For every $P \in \Gamma^{-f}$ we have $P^f \in \Gamma$, so $x \models_B P^f$ and hence $x^f \models_A P$; that is, $x^f$ satisfies all of $\Gamma^{-f}$ (vacuously if $\Gamma^{-f} = \emptyset$). By the premise, $x^f$ must satisfy at least one $Q \in \Delta^{-f}$. Since $x^f \models_A Q$, then $x \models_B Q^f$ with $Q^f \in \Delta$, which contradicts $x$ being a counterexample. $f$-Elim fails to preserve validity, since it is possible for a counterexample in the conclusion to have no preimage under $f$. Of course, if $f^\vee(Tok(B)) = Tok(A)$, then the rule will preserve validity. Preservation of non-validity is exactly the opposite for the two rules.
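The asymmetry between $f$-Intro and $f$-Elim can be seen on a small example in which one token of $A$ has no preimage under $f$. This Python sketch assumes relations are sets of (token, type) pairs and that $f$ is given by hypothetical dictionaries `up` (types) and `down` (tokens). It confirms the first form of $f$-Intro over all sequents on $Typ(B)$ and collects first-form $f$-Elim failures; the token $a_3$, a bird that is not an animal and lies outside the image of `down`, produces one.

```python
from itertools import combinations


def subsets(xs):
    xs = sorted(xs)
    return [set(c) for r in range(len(xs) + 1) for c in combinations(xs, r)]


def holds(rel, tokens, gamma, delta):
    """Γ ⊢ Δ holds if every token of all of Γ is of some type in Δ."""
    return all(not all((x, P) in rel for P in gamma) or any((x, Q) in rel for Q in delta)
               for x in tokens)


# Infomorphism f : A -> B (hypothetical data); token a3 has no preimage under f.
tokens_A = {"a1", "a2", "a3"}
rel_A = {("a1", "bird"), ("a1", "animal"), ("a2", "animal"), ("a3", "bird")}
tokens_B = {"b1", "b2"}
rel_B = {("b1", "Bird"), ("b1", "Animal"), ("b2", "Animal")}
up = {"bird": "Bird", "animal": "Animal"}   # f on types, Typ(A) -> Typ(B)
down = {"b1": "a1", "b2": "a2"}             # f on tokens, Tok(B) -> Tok(A)


def pre(types_B):     # Γ^{-f}
    return {P for P in up if up[P] in types_B}


def img(types_A):     # Δ^{f}
    return {up[P] for P in types_A}


# f-Intro (first form): Γ^{-f} ⊢_A Δ^{-f} valid implies Γ ⊢_B Δ valid
intro_ok = all(holds(rel_B, tokens_B, G, D)
               for G in subsets(set(up.values())) for D in subsets(set(up.values()))
               if holds(rel_A, tokens_A, pre(G), pre(D)))
# f-Elim (first form): Γ^{f} ⊢_B Δ^{f} valid, yet Γ ⊢_A Δ may fail
elim_failures = [(G, D) for G in subsets(up) for D in subsets(up)
                 if holds(rel_B, tokens_B, img(G), img(D))
                 and not holds(rel_A, tokens_A, G, D)]
print(intro_ok)                                    # True
print(({"bird"}, {"animal"}) in elim_failures)     # True: a3 is a bird but not an animal
```

If `down` were onto $Tok(A)$, no such failure could arise, matching the remark that $f$-Elim preserves validity when $f^\vee(Tok(B)) = Tok(A)$.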

The two forms of each rule are quite different because they work on sets. Consider the two cases of $f$-Elim. In the first, the types in $\Gamma$ and $\Delta$ are types of $A$ that have been mapped to $B$ under $f$. In the second, the types in $\Gamma$ and $\Delta$ are types of $B$ that are pulled back along $f$ to types in $A$.

Probabilities can be assigned to sequents. Consider the simple sequent in $A$ and its satisfying condition:

$$P \vdash_A Q \qquad \forall x\,(x \models_A P \text{ implies } x \models_A Q).$$

To attach a probability to this sequent means to weaken it so that it only holds for some of the tokens and fails to hold for the rest. Hence, to weaken the sequent is to remove the universal quantifier and then attach a probability to $x \models_A Q$ given that $x \models_A P$ for arbitrary $x$. What is the probability that $x$ satisfies $Q$ given that it satisfies $P$? This is a statement of conditional probability, so we make the following definition:

$$\mathbb{P}(P \vdash_A Q) = \mathbb{P}(Q \mid P).$$

When a sequent's conditional probability is known to be, say, $p$, this will be indicated by $P \vdash^{p}_{A} Q$.

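To actually use $P \vdash_A Q$ in an argument, one must first have

$$x \models_A P.$$

The probability of this obtaining in $A$ is $\Pr(P)$. The use of the rule has the computed probability

$$\Pr(P) \cdot \Pr(P \vdash_A Q),$$

which, by the definition above, equals $\Pr(P)\Pr(Q \mid P)$, the probability that a token satisfies both $P$ and $Q$.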
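As a small numeric check of the definition and of the computed probability of a use, here is a Python sketch assuming a finite classification whose tokens carry hypothetical probability weights; `tokens`, `pr`, and `pr_sequent` are illustrative names, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

# Hypothetical finite classification: each token has a probability weight
# and the set of types it satisfies.
tokens = {
    "x1": (Fraction(1, 2), {"P", "Q"}),
    "x2": (Fraction(1, 4), {"P"}),
    "x3": (Fraction(1, 4), {"Q"}),
}


def pr(types):
    """Probability that a token satisfies every type in `types`."""
    return sum((w for w, ts in tokens.values() if set(types) <= ts), Fraction(0))


def pr_sequent(p, q):
    """Pr(P |-_A Q) := Pr(Q | P); assumes Pr(P) > 0."""
    return pr({p, q}) / pr({p})


use = pr({"P"}) * pr_sequent("P", "Q")
print(pr_sequent("P", "Q"), use, pr({"P", "Q"}))  # 2/3 1/2 1/2
```

The last two printed values agree, illustrating that $\Pr(P)\cdot\Pr(P \vdash_A Q)$ is the probability of satisfying both types.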

The use of conditional probability to interpret $\vdash$ is similar to the use of conditional probability in [1] to interpret $\Rightarrow$. In that book, the use of $\Rightarrow$ is derived from conditional probability. Here, $\vdash$ is a pre-existing concept which, given a probabilistic clothing, is a definition of conditional probability. This points out that $\vdash$ is not the same as the material conditional of classical logic and, in fact, has no proof-theoretic character in channel theory unless provided with a supporting cast which includes a formal system.

Channel theory has sequents of the form $\Gamma \vdash_A \Delta$ for a classification $A$. To use a sequent of this form, $\Pr$ will need to be extended to cover sequents, for the following calculation:

$$\Pr(\Gamma) \cdot \Pr(\Gamma \vdash_A \Delta).$$
For a token to satisfy $\Gamma$, it must satisfy every element of $\Gamma$, and hence $\Gamma$ is thought of conjunctively. In probability theory, this is generally not a problem since $\Gamma$ could be equated to $\bigwedge\Gamma$ and $Typ(A)$ would actually be a Borel lattice of sets with countable meets and joins. Space prevents us from a complete exposition here, but the trick is to transfer $\Pr$ from $Typ(A)$ to the set of token sets $\{Tok(P) \mid P \in Typ(A)\}$. As a set of sets, a semilattice can be generated isomorphic to the free meet semilattice using $Typ(A)$ as generators. This is essentially the intersection semilattice generated by $\{Tok(P) \mid P \in Typ(A)\}$. A similar construction must be done for the complete join semilattice to evaluate $\Pr(\Delta \mid \Gamma)$. However, a further quotient join semilattice must be constructed by dividing the join semilattice by a collection of equalities of the form $P = 1$ for all $P \in \Gamma$, where $1$ is the top of the join semilattice.

A good probability function is any that satisfies the probability axioms stated earlier, now transferred to $\{Tok(P) \mid P \in Typ(A)\}$, and is defined over the newly introduced semilattices. Space prevents a fuller exposition here, and the sequents used in examples in the sequel will be only of the form $P \vdash Q$ or $P, P' \vdash Q$.
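For a finite token set, the meet and join constructions reduce to intersections and unions of token sets. The sketch below assumes hypothetical equal token weights and names (`weights`, `sat`, `tok`, `pr_sequent`); it reads $\Gamma$ conjunctively (intersection of the $Tok(P)$ for $P \in \Gamma$) and $\Delta$ disjunctively (union over $\Delta$), which is one simple way to realize $\Pr(\Delta \mid \Gamma)$ in the finite case, not the general semilattice construction.

```python
from fractions import Fraction

# Hypothetical finite classification with equal token weights.
weights = {t: Fraction(1, 4) for t in ("t1", "t2", "t3", "t4")}
sat = {
    "t1": {"P", "P2", "Q"},
    "t2": {"P", "P2", "R"},
    "t3": {"P", "P2"},
    "t4": {"P"},
}


def tok(typ):
    """Tok(typ): the set of tokens of type `typ`."""
    return {x for x, ts in sat.items() if typ in ts}


def mass(xs):
    return sum((weights[x] for x in xs), Fraction(0))


def pr_sequent(gamma, delta):
    """Pr(Gamma |- Delta) = Pr(Delta | Gamma): meet (intersection) over Gamma,
    join (union) over Delta; assumes some token satisfies all of Gamma."""
    meet = set(weights)
    for g in gamma:
        meet &= tok(g)
    join = set()
    for d in delta:
        join |= tok(d)
    return mass(meet & join) / mass(meet)


print(pr_sequent({"P", "P2"}, {"Q", "R"}))  # 2/3
```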


5  Information  Channels 

An information channel is a classification used to connect other classifications, where the connections are infomorphisms. It is information channels that support information flow by means of sequents. One might think that the notion of a “channel” should be captured by an infomorphism. However, an information channel in the binary case (where two classifications are being connected) is a two-way channel. An information channel supports the form of distributed reasoning in which one can think of the reasoning as moving along the channel. This is an entirely abstract concept which, given some restrictions, has communication channels as concrete instances.


5.1  Basic  Definitions 


Definition 5.1.1 (Barwise-Seligman) An information channel consists of an indexed family $\mathcal{C} = \{f_i : A_i \rightleftarrows C\}_{i \in I}$ of infomorphisms with a common codomain $C$ called the core of the channel.


Frequently in the sequel, the term channel will be (mis)used to refer to the core of the channel. This is merely for expediency, and the reader is asked to be forgiving. There is never any question as to which infomorphisms are involved.


Example 5.1.2 Let $C$ model a single user sending messages to two different people, Alice, modeled by $A$, and Eve, modeled by $E$. $C$ is to be a channel between $A$ and $E$, but notice that this is not a communication channel, since neither Alice nor Eve is sending messages to the other. The tokens are the individual mail messages with Alice and Eve both mentioned as recipients in the message headers. The types are facts about those mail messages. Let $A$, $C$, and $E$ all share the same types and the same tokens. The channel diagram is

$$A \xrightarrow{\ \mathrm{id}\ } C \xleftarrow{\ \mathrm{id}\ } E$$


Alice can reason about what Eve knows by reading the mail messages and noticing that the same messages were sent to Eve. Eve can do likewise; hence this is a bidirectional information channel. In channel-theoretic terms, Alice reasons by seeing if a token satisfies sequents of the form $\Gamma \vdash_A \Delta$. Since all the infomorphisms are the identity (so their token maps are onto and both $f$-Intro and $f$-Elim preserve validity), Alice knows that $\Gamma \vdash_C \Delta$ holds in the channel and that $\Gamma \vdash_E \Delta$ holds for Eve.

The above analysis points out that the channels of channel theory are (in general) bidirectional. The reason is that they present us with ways of stating properties of the information of the channel, and those properties are entirely determined by the outside environment, either by ourselves by fiat (convention) or by physical attributes. These properties are then formalized as the types of the channel. Current in a wire is a good example: it is only by stipulation that current flows in one direction, when in fact it can be viewed as bidirectional for positive and negative charge.


Example 5.1.3 Consider the case of an initiator of communication, $I$, and a receiver, $R$, with a binary channel, $IR$, between them meant to represent a communication channel, with infomorphisms $i : I \rightleftarrows IR$ and $r : R \rightleftarrows IR$.


Let $i : Tok(IR) \rightarrowtail Tok(I)$ and $r : Tok(IR) \rightarrowtail Tok(R)$ be the token maps; i.e., $i$ and $r$ are like projection maps except that they inject tokens from the channel into the token sets of $I$ and $R$. The injections model that $I$ only sends part of $Tok(I)$ and $R$ only receives part of $Tok(R)$.

Let a sequent in an information channel representing a communication channel be called a gate. It is tempting to view the classification structure (on the left above) as a mathematical description of the (intuitive view of a) communication channel (on the right), where each $\circ$ in the respective classifications is a tuple of the $\models$ relation specific to that classification. The tuples are the information that is produced at $I$, travels through $IR$ via one of the routes, and arrives at $R$. Each route is mediated by a gate (sequent) to which a probability will be assigned.

This  second  diagram  is  misleading  in  the  sense  that  information  tuples  do  not  actually  move  in  the 
classification  scheme.  Instead,  there  are  static  mathematical  relationships  which  relate  tuples  of  the 
classification  I  to  those  of  IR,  and  similarly,  tuples  of  R  to  those  of  IR.  It  is  our  external  claim  that 
the  mathematics  models  the  communication  channel.  Now  that  there  is  a  mathematical  model,  however, 
it  can  be  tested  to  see  with  what  degree  of  fidelity  it  models  the  real  situation. 

The initiator $I$ intends to send not simply a message $m^i$ but instead the tuple $(m^i, C) \in {\models_I}$, since this is the basic unit of currency in channel theory. It is the image $m^i$, under infomorphism $i$, of channel message $m$ that $I$ is sending. If there is no $C$ for which $m^i \models_I C$, then in effect there is nothing $I$ can say about $m^i$. The communication can still take place, but nothing can be said about the actual value transferred.

It is $I$'s intention that the fact $m^i \models_I C$ be communicated to $R$. Assuming no loss of information for the signal, this requires that $I$ and $R$ agree on the types used for communication purposes. The sense of the communication is then

$$
\begin{array}{rcll}
m^i \models_I C & \text{iff} & m \models_{IR} C^i & \text{infomorphism condition}\\
 & \text{implies} & m \models_{IR} C^r & ?\\
 & \text{iff} & m^r \models_R C & \text{infomorphism condition}
\end{array}
$$

where ? indicates a missing reason (supplied below). Clearly, this should be the case for all types $C$.
Necessarily, $I$ and $R$ must have agreed on channel sequents for these (but not all) types. Suppose there are no channel sequents. It is possible that $m^r \models_R C'$ for some $C'$. One could hardly say that communication has taken place, because $C'$ has no connection with $I$. The relationship $m^r \models_R C'$ is spurious or accidental, and $R$ can get no information about $I$ from it.

Note that one cannot even tell in which direction the communication is taking place. To actually say something about the communication, one must classify precisely what is to be said. Let there be types $Src \in Typ(I)$ and $Dst \in Typ(R)$ such that for all tokens $x \in Tok(IR)$,

$$x^i \models_I Src, \qquad x^r \models_R Dst.$$

Now the channel models a direction via the stipulated conditions on infomorphisms, i.e.,

$$x^i \models_I Src \ \text{iff}\ x \models_{IR} Src^i, \qquad x^r \models_R Dst \ \text{iff}\ x \models_{IR} Dst^r,$$

and with channel tokens satisfying the following gate (left) via the condition on the right:

$$Src^i \vdash_{IR} Dst^r \qquad \text{for all } z \in Tok(IR),\ z \models_{IR} Src^i \text{ implies } z \models_{IR} Dst^r.$$
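The following minimal Python sketch checks the gate on a toy instance, assuming hypothetical token names (`s1`, `d1`, `m1`, and so on) and representing types of $IR$ as pairs such as `("i", "Src")` for $Src^i$. Satisfaction in $IR$ is defined directly from the infomorphism conditions above. In this toy instance the reverse gate also holds, which is one way of seeing that the direction is a stipulation rather than something the tokens force.

```python
# Hypothetical tokens of I and R, each with the set of types it satisfies.
I_sat = {"s1": {"Src"}, "s2": {"Src"}, "s3": set()}  # s3 is never sent
R_sat = {"d1": {"Dst"}, "d2": {"Dst"}}

# Token maps of the infomorphisms i and r (injective maps out of Tok(IR)).
i_tok = {"m1": "s1", "m2": "s2"}
r_tok = {"m1": "d1", "m2": "d2"}
IR_tokens = set(i_tok)


def ir_models(z, typ):
    """Satisfaction in IR as forced by the infomorphism conditions:
    z |=_IR T^i iff z^i |=_I T, and z |=_IR T^r iff z^r |=_R T."""
    side, t = typ
    if side == "i":
        return t in I_sat[i_tok[z]]
    return t in R_sat[r_tok[z]]


def gate_holds(lhs, rhs):
    """lhs |-_IR rhs for single types: every z satisfying lhs satisfies rhs."""
    return all(ir_models(z, rhs) for z in IR_tokens if ir_models(z, lhs))


print(gate_holds(("i", "Src"), ("r", "Dst")))  # True
# The reverse gate holds as well, so the direction is a stipulation.
print(gate_holds(("r", "Dst"), ("i", "Src")))  # True
```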

This gate supplies the missing condition (?) above for $C^i = Src^i$ and $C^r = Dst^r$. This stipulated direction through the channel appears artificial, but it is also echoed in information transfer in Shannon's theory. There, all one has is a measurement of the information that was transferred. From the measurements alone, it is impossible to tell the direction of information flow, since mutual information is symmetric: $I(X;Y) = I(Y;X)$.
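The symmetry behind this claim can be seen numerically. Below is a short Python sketch, assuming a hypothetical joint distribution of a sent symbol $X$ and a received symbol $Y$; `mutual_information` is an illustrative helper, not part of the original analysis. Swapping the roles of sender and receiver leaves the measured quantity unchanged.

```python
import math


def mutual_information(joint):
    """I(X;Y) in bits from a joint distribution {(x, y): probability}."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return sum(p * math.log2(p / (px[x] * py[y])) for (x, y), p in joint.items() if p > 0)


# Hypothetical joint distribution of sent symbol X and received symbol Y.
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}
swapped = {(y, x): p for (x, y), p in joint.items()}
print(mutual_information(joint), mutual_information(swapped))
```

Both printed values agree (up to floating-point rounding), so the number alone cannot say which end was the source.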

Definition 5.1.4 (Barwise-Seligman) A distributed system $\mathcal{A}$ consists of an indexed family $cla(\mathcal{A}) = \{A_i\}_{i \in I}$ of classifications together with a set $inf(\mathcal{A})$ of infomorphisms, all having both domain and codomain in $cla(\mathcal{A})$.

A distributed system is simply a collection of classifications and some infomorphisms between some of the classifications. From Barr in [2], reporting on the work of his graduate student Chu, it is clear that categories of classifications have colimits. A colimit of a distributed system is a minimal channel amongst all the channels, each channel connecting the entire distributed system. To be a channel for a distributed system is to cover the system. An analogous concept in partial orders is that of an upper bound (think of classifications as points and infomorphisms as elements of the partial order relation); a colimit would then be a least upper bound.

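Definition 5.1.5 (Barwise-Seligman) A channel $\mathcal{C} = \{h_i : A_i \rightleftarrows C\}_{i \in I}$ covers a distributed system $\mathcal{A}$ if for each $i, j \in I$ and each infomorphism $f : A_i \rightleftarrows A_j$ in $inf(\mathcal{A})$, the triangle formed by $f$, $h_i$, and $h_j$ commutes:

$$h_j \circ f = h_i.$$

$\mathcal{C}$ is a minimal cover of a distributed system $\mathcal{A}$ if it covers $\mathcal{A}$ and, for every other channel $\mathcal{D}$ (with core $D$) covering $\mathcal{A}$, there is a unique infomorphism from $C$ to $D$.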
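The covering condition can be checked mechanically for finite systems. This is a minimal Python sketch, assuming each infomorphism is given as a pair (type map, token map) of dictionaries and that these maps are already known to be infomorphisms (satisfaction is not re-checked); `compose`, `covers`, and the small example are hypothetical. Composition sends types forward and tokens backward, so $(h_j \circ f)$ has type map $h_j \circ f$ on types and token map $f \circ h_j$ on tokens.

```python
def compose(g, f):
    """g o f for infomorphisms f : A -> B and g : B -> C, each a pair
    (type_map, token_map); types go forward, tokens go backward."""
    f_up, f_down = f
    g_up, g_down = g
    up = {t: g_up[f_up[t]] for t in f_up}
    down = {c: f_down[g_down[c]] for c in g_down}
    return up, down


def covers(channel, system):
    """channel: {i: h_i}; system: list of (i, j, f) with f : A_i -> A_j.
    True iff h_j o f == h_i for every f (maps assumed to be infomorphisms)."""
    return all(compose(channel[j], f) == channel[i] for i, j, f in system)


# Hypothetical system A1 -> A2 with a single infomorphism f.
f = ({"a": "b"}, {"y": "x"})
system = [(1, 2, f)]
channel = {1: ({"a": "c"}, {"z": "x"}), 2: ({"b": "c"}, {"z": "y"})}
bad_channel = {1: ({"a": "c"}, {"z": "x2"}), 2: ({"b": "c"}, {"z": "y"})}
print(covers(channel, system), covers(bad_channel, system))  # True False
```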

Theorem 5.1.6 (Chu) Every distributed system has a minimal cover, and it is unique up to isomorphism.
Example 5.1.7 An infomorphism is a restricted form of a channel. This construction of a channel from an infomorphism is instructive in that it shows how much more freedom there is in the notion of an information channel. Let $f : A \rightleftarrows B$ be an infomorphism. The intuitive idea is to represent $f$ in its graph form on tokens and as a quotient on the disjoint union of $Typ(A)$ and $Typ(B)$ on types. The colimit of this as a distributed system consists of a core $C$ with infomorphisms $\pi_1 : A \rightleftarrows C$ and $\pi_2 : B \rightleftarrows C$, subject to the conditions

$$f(x) = f(\pi_2((y,x))) = \pi_1((y,x)) = y \quad \text{for all tokens } (y,x) \in Tok(C),$$

$$\pi_2(f(Q)) = \pi_1(Q) \quad \text{for all types } Q \in Typ(A).$$

This simply ensures that a type $Q$ from $A$ is sent to the same type as $f(Q)$ when both are injected into the channel $C$.
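The construction can be carried out by hand for a small finite infomorphism. The Python sketch below assumes hypothetical classifications `A_sat` and `B_sat` and names each type class of $C$ by the unique $B$-type it contains (every $A$-type $Q$ is identified with $f(Q)$). It then checks that $\pi_1$ and $\pi_2$ satisfy the infomorphism condition and the two displayed conditions.

```python
# Hypothetical finite infomorphism f : A -> B.
A_sat = {"a1": {"p", "q"}, "a2": {"p"}}
B_sat = {"b1": {"P", "Q"}, "b2": {"P", "Q"}}
A_types, B_types = {"p", "q"}, {"P", "Q"}
f_up = {"p": "P", "q": "Q"}        # Typ(A) -> Typ(B)
f_down = {"b1": "a1", "b2": "a1"}  # Tok(B) -> Tok(A)

# Core C: tokens are the graph of f_down, written (y, x) with y = f(x).
C_tokens = {(f_down[x], x) for x in B_sat}
# Types of C: classes of Typ(A) + Typ(B) under Q ~ f(Q); each class contains
# exactly one B-type, which is used to name it.


def c_models(tok, typ):
    _, x = tok
    return typ in B_sat[x]


pi1_up = {q: f_up[q] for q in A_types}  # Q goes to the class of f(Q)
pi2_up = {q: q for q in B_types}


def pi1_down(tok):
    return tok[0]


def pi2_down(tok):
    return tok[1]


def is_infomorphism(dom_sat, dom_types, up, down):
    """down(c) |= t in the domain iff c |=_C up(t), for every token c of C."""
    return all((t in dom_sat[down(c)]) == c_models(c, up[t]) for c in C_tokens for t in dom_types)


print(is_infomorphism(A_sat, A_types, pi1_up, pi1_down))  # True
print(is_infomorphism(B_sat, B_types, pi2_up, pi2_down))  # True
# The displayed conditions: f(pi2(t)) = pi1(t) on tokens, pi2(f(Q)) = pi1(Q) on types.
print(all(f_down[pi2_down(c)] == pi1_down(c) for c in C_tokens))  # True
print(all(pi2_up[f_up[q]] == pi1_up[q] for q in A_types))  # True
```

Note that token `a2` of $A$ never appears in $C$, reflecting that the graph is taken over $Tok(B)$.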

5.2  Modeling  Communication 

We now study an example from [6], where a standard Shannon-type analysis of a covert channel was done. We show how our new framework extends the classical analysis. The scenario is simple: there are two users, Alice and Clueless, inside a private enclave. Alice and Clueless have no knowledge of what the other is doing. Each user may transmit either no message or one message per unit time to a second enclave. The transmissions between enclaves are encrypted, and all messages appear the same to an eavesdropper, Eve. The only thing that Eve can do is count the number of messages (per unit time) going from the first enclave (that of Alice and Clueless) to the second enclave. Therefore Eve sees zero, one, or two messages per unit time. Alice uses this scenario to covertly communicate with Eve. Alice will attempt to send one bit to Eve per unit time interval. This is the most that Alice can send, because Alice has only two actions. The actions of Clueless act as noise in the covert channel.

Alice will send a 0 by not sending a message. If Alice sends a 0 and Clueless does not transmit, then Eve receives a 0. Alice will send a 1 by sending a message. If Alice sends a 1 and Clueless does not transmit, then Eve receives a 1. If Alice sends a 0 and Clueless does transmit, then Eve also receives a 1. If Alice sends a 1 and Clueless does transmit, then Eve receives a 2. Therefore, Eve is only certain of Alice's transmission if Eve receives a 0 or a 2. The received symbol 1 is a noisy symbol. In the following matrix, $x_1$ represents the actions of Alice, $x_2$ the actions of Clueless, and $x_3$ the symbols that Eve receives. Time is measured in discrete, integral ticks.

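Consider the communication channel with core $ACE$, connected to the classifications $A$ (Alice), $C$ (Clueless), and $E$ (Eve) by infomorphisms $a : A \rightleftarrows ACE$, $c : C \rightleftarrows ACE$, and $e : E \rightleftarrows ACE$; its tokens are summarized in the following matrix:

$$
\begin{array}{cccc}
x_1 & x_2 & x_3 & t\\ \hline
0 & 0 & 0 & t_1\\
1 & 0 & 1 & t_2\\
0 & 1 & 1 & t_3\\
1 & 1 & 2 & t_4
\end{array}
$$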

Tokens in the channel are of the form $(x_1, x_2, x_3, t)$, where the allowable combinations of the values $x_j$ and the tick time $t$ (a natural number) are recorded in the matrix above. For $x = (x_1, x_2, x_3, t)$, the token maps are $a(x) = (x_1, t)$, $c(x) = (x_2, t)$, and $e(x) = (x_3, t)$. The types for the component classifications $A$ and $C$ are $\{0, 1\}$, and the types for $E$ are $\{0, 1, 2\}$. These types are injected into the channel (where the superscript indicates which infomorphism did the injection). The channel gates, labeled $g_i$, and their respective conditional probabilities are the following:

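$$
\begin{array}{lll}
g_1: & 0^a, 0^c \vdash_{ACE} 0^e & \Pr(0^e \mid 0^a, 0^c) = 1\\
g_2: & 0^a, 1^c \vdash_{ACE} 1^e & \Pr(1^e \mid 0^a, 1^c) = 1\\
g_3: & 1^a, 0^c \vdash_{ACE} 1^e & \Pr(1^e \mid 1^a, 0^c) = 1\\
g_4: & 1^a, 1^c \vdash_{ACE} 2^e & \Pr(2^e \mid 1^a, 1^c) = 1
\end{array}
$$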
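The gates can be checked directly against the token matrix. The Python sketch below assumes the four token patterns from the matrix (tick time omitted) and encodes a type such as $0^a$ as the hypothetical pair `(0, "a")`; `satisfies` and `gate_probability` are illustrative helpers. Each gate's probability is the share of tokens satisfying its left-hand side that also satisfy its right-hand side.

```python
# Token patterns (x1, x2, x3) of ACE from the matrix above; tick time omitted.
rows = [(0, 0, 0), (1, 0, 1), (0, 1, 1), (1, 1, 2)]


def satisfies(tok, typ):
    """typ = (value, component): tok satisfies v^a iff x1 == v, v^c iff x2 == v, v^e iff x3 == v."""
    value, comp = typ
    return tok["ace".index(comp)] == value


gates = {
    "g1": ([(0, "a"), (0, "c")], (0, "e")),
    "g2": ([(0, "a"), (1, "c")], (1, "e")),
    "g3": ([(1, "a"), (0, "c")], (1, "e")),
    "g4": ([(1, "a"), (1, "c")], (2, "e")),
}


def gate_probability(lhs, rhs, tokens):
    """Share of the tokens satisfying every type in lhs that also satisfy rhs."""
    matching = [t for t in tokens if all(satisfies(t, p) for p in lhs)]
    return sum(satisfies(t, rhs) for t in matching) / len(matching)


for name, (lhs, rhs) in gates.items():
    print(name, gate_probability(lhs, rhs, rows))  # each gate prints 1.0
```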

Each gate transfers information with probability 1. That is, for every token in the channel, if the left-hand side of the gate is satisfied, the right-hand side is satisfied. The channel connecting $A$, $C$, and $E$ is taken from a global perspective. To model the system from the more local perspective of only Alice and Eve, the types injected by Clueless must be ignored. Consider an infomorphism $k$ from a new channel to $ACE$:

$$k : AC'E \rightleftarrows ACE,$$

where $C'$ has lost the types 0 and 1 and is unable to inject them into the channel $AC'E$. The morphism $k$ is stipulated to be the identity on $Tok(ACE)$ and an injection on $Typ(AC'E)$. In general, for the infomorphism $f : X \rightleftarrows Y$, the rules

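$$
\frac{\Gamma^{f} \vdash_Y \Delta^{f}}{\Gamma \vdash_X \Delta}\ f\text{-Elim}
\qquad
\frac{\Gamma \vdash_Y \Delta}{\Gamma^{-f} \vdash_X \Delta^{-f}}\ f\text{-Elim}
$$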

do not preserve validity (as previously noted). However, they fail to do so for very different reasons. The first form is from Barwise/Seligman [3]; the second form is new. The first fails because there can easily be tokens of $X$ which fail the conclusion but are not of the form $f(y)$ for any $y \in Tok(Y)$. The second form can fail because not every type of $Y$ need be the image of a type of $X$. Hence, even if $Tok(X) = Tok(Y)$, tokens that inadvertently satisfied $\Gamma \vdash_Y \Delta$ by failing to satisfy all types in $\Gamma$ might easily satisfy all types in $\Gamma^{-f}$, simply because the use of sets and functions only guarantees that $(\Gamma^{-f})^{f} \subseteq \Gamma$, not $(\Gamma^{-f})^{f} = \Gamma$.

Consider the following use of the second form of $k$-Elim:

$$\frac{0^a, 0^c \vdash_{ACE} 0^e}{0^a \vdash_{AC'E} 0^e}\ k\text{-Elim}$$

The conclusion of the rule does not hold, because a token of the form $(0, 1, 1, t)$ is a counter-example to the conclusion, whereas the premise is a valid gate in $ACE$. The normal token $(0, 0, 0, t)$ of $ACE$ satisfies the conclusion; however, it cannot be considered a normal token of $AC'E$, since it is a counter-example to the conclusion of another use of $k$-Elim (see $g_2'$ below). It is but a short step to assign a probability to the conclusions of the four uses of this rule, namely the gates below, summarized compactly in a channel matrix (identical to that shown in [6]):

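$$
\begin{array}{ll}
g_1': & 0^a \vdash^{p}_{AC'E} 0^e\\
g_2': & 0^a \vdash^{q}_{AC'E} 1^e\\
g_3': & 1^a \vdash^{\alpha}_{AC'E} 1^e\\
g_4': & 1^a \vdash^{\beta}_{AC'E} 2^e
\end{array}
\qquad\qquad
\begin{array}{c|ccc}
 & 0^e & 1^e & 2^e\\ \hline
0^a & p & q & 0\\
1^a & 0 & \alpha & \beta
\end{array}
$$

The probabilities are assigned by using the proportion of tokens which are normal (for each gate alone) to the total number of normal and non-normal tokens (for each gate alone). Incidentally, in [6] it is shown that $p = \alpha$ and $q = \beta$ for this example, due to the way Clueless acts.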
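The counter-example and the channel matrix can both be reproduced from the token matrix. The Python sketch below is an illustration under a stated assumption: it supposes (hypothetically) that Clueless transmits independently with probability `c` at each tick and weights the token proportions accordingly; the model in [6] may differ in its details. Under that assumption the matrix entries come out as $p = \alpha = 1 - c$ and $q = \beta = c$.

```python
from fractions import Fraction

rows = [(0, 0, 0), (1, 0, 1), (0, 1, 1), (1, 1, 2)]  # (x1, x2, x3); tick omitted

# After k-Elim drops 0^c, the gate 0^a |- 0^e is judged on x1 and x3 alone.
counterexamples = [r for r in rows if r[0] == 0 and r[2] != 0]
print(counterexamples)  # [(0, 1, 1)]


def channel_matrix(c):
    """Pr(x3 | x1) as a weighted proportion of tokens, assuming (hypothetically)
    that Clueless transmits independently with probability c each tick."""
    weight = {r: (c if r[1] == 1 else 1 - c) for r in rows}
    matrix = {}
    for a in (0, 1):
        total = sum(w for r, w in weight.items() if r[0] == a)
        for e in (0, 1, 2):
            hits = sum(w for r, w in weight.items() if r[0] == a and r[2] == e)
            matrix[a, e] = hits / total
    return matrix


m = channel_matrix(Fraction(1, 3))
print(m[0, 0] == m[1, 1], m[0, 1] == m[1, 2])  # True True  (p = alpha, q = beta)
```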
5.3  Modeling  Diagram  Transmission

There is an interesting twist on the previous example which was brought up in [5] in connection with steganography. The details are changed here due to space considerations. Suppose a bitmap of a picture is to be sent and the picture contains the diagram of the numeral 1. Suppose further that there is a noise producer, similar to the one in the previous example, that flips bits according to a uniform random distribution. The question is how, even with moderate amounts of noise and reduced channel capacity, the 1 can still be received and recognized as a 1. Informationally, the noise, unless it rises high enough, does nothing to degrade the information being sent.
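A small simulation illustrates the point. The Python sketch below assumes a hypothetical 5x5 bitmap of the numeral 1, flips each bit independently with probability `eps` (a binary symmetric channel), and uses a nearest-template recognizer (Hamming distance to the 1 versus to a blank picture) as a stand-in for "looks like a 1"; none of these choices come from the original. With `eps = 0.1` the per-bit capacity $1 - H(\epsilon)$ drops to roughly 0.53 bits, yet nearly every noisy picture is still recognized as a 1.

```python
import math
import random

# Hypothetical 5x5 bitmap of the numeral 1 and an empty picture.
ONE = ["00100", "01100", "00100", "00100", "01110"]
BLANK = ["00000"] * 5


def to_bits(rows):
    return [int(ch) for row in rows for ch in row]


def flip(bits, eps, rng):
    """Flip each bit independently with probability eps (a binary symmetric channel)."""
    return [b ^ (rng.random() < eps) for b in bits]


def hamming(u, v):
    return sum(a != b for a, b in zip(u, v))


def looks_like_one(bits):
    """Stand-in recognizer: is the picture closer to ONE than to BLANK?"""
    return hamming(bits, to_bits(ONE)) < hamming(bits, to_bits(BLANK))


def bsc_capacity(eps):
    """Capacity 1 - H(eps) of a binary symmetric channel, in bits per use."""
    if eps in (0, 1):
        return 1.0
    return 1 + eps * math.log2(eps) + (1 - eps) * math.log2(1 - eps)


rng = random.Random(0)
eps = 0.1
hits = sum(looks_like_one(flip(to_bits(ONE), eps, rng)) for _ in range(1000))
print(round(bsc_capacity(eps), 3), hits / 1000)
```

The recognizer only fails when at least half of the eight "ink" pixels are flipped, which is rare at this noise level; this is one concrete reading of "the noise, unless it rises high enough, does nothing to degrade the information".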

There are several different ways of modeling the situation in channel theory. Essentially, they all reduce to there being another channel involved that is derived from, and in addition to, the existing communication channel. Specifically, assume the channel $ACE$ from the previous example, except that here Alice ($A$) is now sending the bits of the picture, $C$ is the noise producer, and Eve ($E$) is receiving the picture. There is another channel, $\mathcal{ACE}$, with gates:


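$$
\begin{array}{ll}
g_1: & O^a, F^c \vdash_{\mathcal{ACE}} O^e\\
g_2: & I^a, F^c \vdash_{\mathcal{ACE}} I^e\\
g_3: & O^a, F^c \vdash_{\mathcal{ACE}} I^e\\
g_4: & I^a, F^c \vdash_{\mathcal{ACE}} O^e
\end{array}
$$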


This is a derived channel where $Tok(\mathcal{A}) = \mathcal{P}(Tok(A))$, $Tok(\mathcal{E}) = \mathcal{P}(Tok(E))$, and $Tok(\mathcal{C}) = \{f_n\}$ for $f_n$ the noise-producing function, defined such that for a token $(X, f_n, Y)$ with $X \in Tok(\mathcal{A})$ and $Y \in Tok(\mathcal{E})$, $f_n(X) = Y$. In short, the token sets are one set-theoretic type level up from the token sets of the originating classifications. The types for $\mathcal{A}$ and $\mathcal{E}$ are $I$ for the diagram of 1 and $O$ for no diagram of 1. As in the previous example, $I^a$ represents the type $I$ injected into the channel from $Typ(\mathcal{A})$. The lone type of $\mathcal{C}$ is $F$.

For classification $\mathcal{A}$, let $X \models_{\mathcal{A}} I$ just when $X$ appears as a picture of the diagram of 1, and $X \models_{\mathcal{A}} O$ otherwise. $X$ necessarily includes some surrounding pixels so that the 1 may be discerned from the background. The situation is similar for $\mathcal{E}$. For the noise producer $\mathcal{C}$, $f_n \models_{\mathcal{C}} F$.

Every token in the channel satisfies one of the gates, with the interpretation that the token satisfies $I^a$ just when Alice thinks it looks like a 1 and satisfies $I^e$ just when Eve thinks it looks like a 1. The token satisfies the respective $O$ types otherwise. Every token satisfies $F^c$, because the image of every channel token under the infomorphism $c$ is $f_n$, which satisfies $F$ in $\mathcal{C}$.

The net result is that $F^c$ is parametric to the gates, since every token satisfies it and it appears in every gate. Hence forming an infomorphism $k$, similar to $k$ in the previous example, does not cause any probabilities to crop up: dropping $F^c$ from the left-hand side of a gate does not change which tokens satisfy that side. Either the sent 1 looks like a 1 to Eve or it does not. The same holds for sending no picture of a diagram of 1.
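The contrast with the previous example can be made concrete. There, dropping $0^c$ turned $\Pr(0^e \mid 0^a, 0^c) = 1$ into a gate with probability $p$. The Python sketch below uses a hypothetical sample of derived-channel tokens, each listed with the types it satisfies, and shows that removing the parametric type $F^c$ from a gate's left-hand side leaves its probability exactly as it was. The token sample, the helper `pr`, and the gate list are illustrative assumptions.

```python
from fractions import Fraction

# Hypothetical tokens of the derived channel, each listed with the types it satisfies.
# Every token satisfies F^c, because its image under c is f_n and f_n satisfies F.
tokens = [
    {"I^a", "F^c", "I^e"},
    {"I^a", "F^c", "I^e"},
    {"O^a", "F^c", "O^e"},
    {"I^a", "F^c", "O^e"},
]


def pr(delta, gamma):
    """Pr(Delta | Gamma) over equally weighted tokens: Gamma conjunctive, Delta disjunctive."""
    base = [t for t in tokens if gamma <= t]
    return Fraction(sum(1 for t in base if t & delta), len(base))


gates = [
    ({"O^a", "F^c"}, {"O^e"}),
    ({"I^a", "F^c"}, {"I^e"}),
    ({"O^a", "F^c"}, {"I^e"}),
    ({"I^a", "F^c"}, {"O^e"}),
]
for gamma, delta in gates:
    # Dropping the parametric type F^c leaves every gate's probability unchanged.
    print(sorted(gamma), sorted(delta), pr(delta, gamma), pr(delta, gamma - {"F^c"}))
```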